7 research outputs found

    FPSNET: An Architecture for Neural-Network-Based Feature Point Extraction for SLAM

    No full text
    The hardware architecture of a deep-neural-network-based feature point extraction method is proposed for simultaneous localization and mapping (SLAM) in robotic applications; it is named the Feature Point based SLAM Network (FPSNET). Several key techniques are deployed to improve hardware and power efficiency. The data path is devised to reduce overall off-chip memory accesses. The intermediate data and partial sums produced in the convolution process are stored in available on-chip memories, and optimized hardware is employed to compute the one-point activation function. Meanwhile, address generation units are used to avoid data overlapping in memories. The proposed FPSNET has been designed in 65 nm CMOS technology with a core area of 8.3 mm². This work reduces the memory overhead for activation storage by 50% compared to traditional data storage, and overall on-chip memory by 35%. Synthesis and simulation results show that it achieves 2.0× higher performance than the previous design while reaching a power efficiency of 1.0 TOPS/W, which is 2.4× better than previous work. Compared to other ASIC designs with similar peak throughput and power efficiency, the presented FPSNET has the smallest chip area (at least a 42.4% reduction).
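As a rough illustration of the data-path idea described above (keeping convolution partial sums in on-chip buffers and fusing the activation before write-back), the following Python sketch tiles the output channels and accumulates each tile in a small local buffer. The function name, tiling scheme, and sizes are illustrative assumptions, not FPSNET's actual architecture.

```python
import numpy as np

def tiled_conv2d(x, w, tile_oc=4):
    """Toy 2-D convolution (valid padding, stride 1) that accumulates
    partial sums in a small on-chip-style buffer, one output-channel
    tile at a time, instead of spilling intermediates off-chip."""
    ic, ih, iw = x.shape            # input channels, height, width
    oc, _, kh, kw = w.shape         # output channels, kernel height/width
    oh, ow = ih - kh + 1, iw - kw + 1
    y = np.zeros((oc, oh, ow))
    for oc0 in range(0, oc, tile_oc):               # one output-channel tile at a time
        t = min(tile_oc, oc - oc0)
        psum = np.zeros((t, oh, ow))                # "on-chip" partial-sum buffer
        for c in range(ic):                         # accumulate over input channels
            for i in range(kh):
                for j in range(kw):
                    patch = x[c, i:i+oh, j:j+ow]
                    psum += w[oc0:oc0+t, c, i, j][:, None, None] * patch
        y[oc0:oc0+t] = np.maximum(psum, 0.0)        # fused ReLU before write-back
    return y
```

Because the ReLU is applied while the tile is still in the local buffer, the pre-activation intermediates never need to be written out, which is the kind of saving the abstract attributes to the optimized data path.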

    HW-ADAM: FPGA-Based Accelerator for Adaptive Moment Estimation

    No full text
    The selection of the optimizer is critical for convergence in on-chip training. As a second-moment optimizer, adaptive moment estimation (ADAM) shows a significant advantage over non-moment optimizers such as stochastic gradient descent (SGD) and first-moment optimizers such as Momentum. However, ADAM is hard to implement in hardware due to its computationally intensive operations, including squaring, square-root extraction, and division. This work proposes Hardware-ADAM (HW-ADAM), an efficient fixed-point accelerator for ADAM highlighting hardware-oriented mathematical optimizations. HW-ADAM has two designs: the Efficient-ADAM (E-ADAM) unit reduces hardware resource consumption by around 90% compared with related work and achieves a throughput of 2.89 MUOP/s (million updating operations per second), 2.8× that of the original ADAM; the Fast-ADAM (F-ADAM) unit reduces flip-flops by 91.5%, look-up tables by 65.7%, and DSPs by 50% compared with related work, and achieves a throughput of 16.7 MUOP/s, 16.4× that of the original ADAM.
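The operations the abstract identifies as hardware-costly (squaring for the second moment, a square root, and a division) appear directly in the plain ADAM update, sketched below for scalar parameters. This is a textbook reference implementation, not the fixed-point HW-ADAM datapath; hyperparameter defaults follow common practice.

```python
import math

def adam_step(theta, grad, m, v, t, lr=1e-3, b1=0.9, b2=0.999, eps=1e-8):
    """One ADAM update over lists of scalar parameters. Note the
    hardware-costly operations per parameter: a square (g * g),
    a square root, and a division."""
    new_theta, new_m, new_v = [], [], []
    for th, g, mi, vi in zip(theta, grad, m, v):
        mi = b1 * mi + (1 - b1) * g                  # first moment (momentum)
        vi = b2 * vi + (1 - b2) * g * g              # second moment: square
        m_hat = mi / (1 - b1 ** t)                   # bias correction, step t >= 1
        v_hat = vi / (1 - b2 ** t)
        th = th - lr * m_hat / (math.sqrt(v_hat) + eps)  # sqrt + division
        new_theta.append(th); new_m.append(mi); new_v.append(vi)
    return new_theta, new_m, new_v
```

Approximating the square root and division with shift-and-add-friendly forms is the kind of hardware-oriented mathematical optimization the abstract alludes to.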

    Power Efficient Tiny Yolo CNN Using Reduced Hardware Resources Based on Booth Multiplier and WALLACE Tree Adders

    No full text
    Convolutional neural networks (CNNs) have attained high accuracy and are widely employed in image recognition tasks. Modern deep-learning applications continue to evolve, which poses a challenge for the research and development of hardware implementations; hardware optimization for an efficient CNN accelerator therefore remains a challenging task. A key component of the accelerator is the processing element (PE) that implements the convolution operation. To reduce hardware resources and power consumption, this article presents a new processing element design as an alternative solution for hardware implementation. A modified Booth encoding (MBE) multiplier and Wallace-tree-based adders are proposed to replace bulky MAC units and the typical adder tree, respectively. The proposed CNN accelerator design is tested on a Zynq-706 FPGA board and achieves a throughput of 87.03 GOP/s for the Tiny-YOLO-v2 architecture. The proposed design reduces hardware cost by 24.5% while achieving a power efficiency of 61.64 GOP/s/W, outperforming previous designs.
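To make the multiplier idea concrete, here is a software model of radix-4 modified Booth encoding, the standard recoding an MBE multiplier is built on; it is a generic sketch, not the article's circuit. The multiplier operand is recoded into digits in {-2, -1, 0, +1, +2}, roughly halving the number of partial products a plain AND-array multiplier would generate; in silicon those partial products would be compressed by a Wallace tree of carry-save adders, whereas this sketch simply sums them.

```python
def booth_radix4_multiply(a, b, nbits=8):
    """Multiply two signed integers via radix-4 modified Booth encoding.
    b is recoded into nbits/2 signed digits, each selecting a partial
    product from {-2a, -a, 0, +a, +2a}."""
    assert nbits % 2 == 0
    ub = b & ((1 << nbits) - 1)                        # two's-complement bits of b
    bits = [0] + [(ub >> i) & 1 for i in range(nbits)]  # bits[0] is implicit b_{-1} = 0
    product = 0
    for j in range(nbits // 2):
        # Overlapping 3-bit group -> Booth digit d in {-2, -1, 0, +1, +2}
        d = bits[2 * j] + bits[2 * j + 1] - 2 * bits[2 * j + 2]
        product += (d * a) << (2 * j)                  # partial product d * a * 4^j
    return product
```

Only nbits/2 partial products reach the adder stage, which is why pairing MBE with a Wallace tree shrinks both area and the critical path relative to a naive multiplier plus ripple adder tree.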

    Modeling of electrical transport in YBCO single layer thin films using flux motion model

    No full text
    The electrical transport properties of YBCO single-layer thin films have been investigated using different physical techniques. For this purpose, the physical properties are probed numerically with the help of simulation modelling. The transport properties were also estimated over the accessible temperature and magnetic field ranges using a thermally activated flux flow model with some modifications. The present simulations indicate that the magnitude of the activation energy depends on both temperature and magnetic field. They also reveal thickness-dependent transport properties, including the electrical and magnetic properties of the deposited YBCO single-layer thin films, and show the temperature dependence of the pinning energy. In a nutshell, these results can be used to improve the superconducting properties (e.g., the critical temperature Tc) of YBCO single-layer thin films.
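For context, a minimal numerical sketch of a thermally activated flux flow (Arrhenius-type) resistivity model of the kind referred to above; the functional form of U(T, B) and all parameter values here are generic illustrations, not the authors' specific modification.

```python
import math

K_B = 8.617e-5  # Boltzmann constant in eV/K

def activation_energy(T, B, U0=0.1, Tc=90.0, n=1.5, alpha=0.5):
    """Illustrative modified form: the pinning (activation) energy
    falls with temperature and with applied magnetic field."""
    if T >= Tc:
        return 0.0
    return U0 * (1.0 - T / Tc) ** n / B ** alpha

def taff_resistivity(T, B, rho0=1.0):
    """Arrhenius law of TAFF: rho = rho0 * exp(-U(T, B) / (k_B * T))."""
    return rho0 * math.exp(-activation_energy(T, B) / (K_B * T))
```

In such models the slope of ln(rho) versus 1/T at fixed field gives the activation energy, which is how a field- and temperature-dependent U(T, B) is typically extracted from transport data.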